Term Clusters Evaluation by Montecarlo Sampling
Abstract
The huge amount of textual information available in firms and institutions creates a need for robust textual data analysis systems. A new field called text mining aims to discover hidden information and knowledge structuring in texts. Statistical methods coupled with natural language processing can provide some answers to this kind of problem. We have developed a term clustering module called Galex (Graph Analyzer for LEXicometry). This paper considers random corpora, which are used to compare homogeneity parameters (precision, recall, extraction probability from a set of categories) with clusters obtained from a real corpus and a hand-made hierarchy related to the domain of the corpus.
Similar resources
Generic BRDF Sampling - A Sampling Method for Global Illumination
This paper introduces a new BRDF sampling method with reduced variance, which is based on a hierarchical adaptive parameterless PDF. This PDF is also based on rejection sampling with a bounded average number of trials, even in regions where the BRDF exhibits high variations. Our algorithm works in an appropriate way with both physical and analytical reflectance models. Reflected directions a...
An Importance Sampling Method for Arbitrary BRDFs
This paper introduces a new BRDF sampling method with reduced variance, which is based on a hierarchical adaptive PDF. This PDF is also based on rejection sampling with a bounded average number of trials, even in regions where the BRDF exhibits high variations. Our algorithm works in an appropriate way with physical, analytical and measured reflectance models. Reflected directions are sampl...
Overelaxed hit-and-run Monte Carlo for the uniform sampling of convex bodies with applications in metabolic network analysis
The uniform sampling of convex regions in high dimension is an important computational issue, from both theoretical and applied points of view. Hit-and-run Monte Carlo algorithms are the most efficient methods known to perform it, and one of their bottlenecks lies in the difficulty of escaping from tight corners in high dimension. Inspired by optimized Monte Carlo methods used in statistical ...
Iterative Turbo Decoding Using Gibbs Sampling
This paper discusses an iterative multiuser receiver for code-division multiple access (CDMA) with forward error control coding. The receiver is derived from the maximum a posteriori (MAP) criterion for the joint received signal. A major drawback of the MAP receiver is its heavy computational cost that grows exponentially with the number of users. An alternative solution is proposed here based on...
Cost-Driven Multiple Importance Sampling for Monte-Carlo Rendering
Global illumination or transport problems can also be considered as a sequence of integrals, and their Monte Carlo solutions as different sampling techniques. Multiple importance sampling takes advantage of different sampling strategies and combines the results obtained with them. In this paper we propose the combination of very different global illumination algorithms in a way that their st...